Algorithmic hospital catchment area estimation using label propagation

Background Hospital catchment areas define the primary population of a hospital and are central to assessing the potential demand on that hospital, for example, due to infectious disease outbreaks. Methods We present a novel algorithm, based on label propagation, for estimating hospital catchment areas from the capacity of the hospital and the demographics of the nearby population, without requiring any data on hospital activity. Results The algorithm is demonstrated to produce a mapping from fine-grained geographic regions to larger scale catchment areas, providing contiguous and realistic subdivisions of geographies relating to a single hospital or to a group of hospitals. In validation against an alternative approach predicated on activity data gathered during the COVID-19 outbreak in the UK, the label propagation algorithm is found to have a high level of agreement and to perform at a similar level of accuracy. Conclusions The algorithm can be used to make estimates of hospital catchment areas in new situations where activity data is not yet available, such as in the early stages of an infectious disease outbreak. Supplementary Information The online version contains supplementary material available at (10.1186/s12913-022-08127-7).


Supplementary materials
The algorithm requires three inputs: first, an estimate of demand, for which we used population counts; second, a geographical network; and third, an estimate of supply, in this case hospital capacity data.
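To illustrate how these three inputs fit together, the following is a minimal sketch of a greedy, capacity-weighted region-growing procedure in the spirit of label propagation. It is not the exact algorithm of the paper: the function name `propagate_labels`, the tie-breaking rules, and the choice to let the hospital with the lowest demand-to-capacity ratio expand next are all assumptions made for illustration.

```python
from typing import Dict, Set


def propagate_labels(
    adjacency: Dict[str, Set[str]],   # geographical network: area -> neighbouring areas
    population: Dict[str, float],     # demand: area -> population count
    capacity: Dict[str, float],       # supply: hospital -> bed capacity
    seeds: Dict[str, str],            # hospital -> area containing the hospital
) -> Dict[str, str]:
    """Assign each area to a hospital by growing catchments outward from seeds.

    Illustrative assumptions: each hospital sits in a distinct seed area, and
    areas unreachable from any seed are left unlabelled.
    """
    labels = {area: hospital for hospital, area in seeds.items()}
    load = {hospital: population[area] for hospital, area in seeds.items()}

    while True:
        # Unlabelled areas bordering each hospital's current catchment.
        frontier: Dict[str, Set[str]] = {hospital: set() for hospital in capacity}
        for area, hospital in labels.items():
            for neighbour in adjacency[area]:
                if neighbour not in labels:
                    frontier[hospital].add(neighbour)

        candidates = [h for h in frontier if frontier[h]]
        if not candidates:
            break

        # The hospital with the least demand per unit of capacity expands next
        # (ties broken by name so the result is deterministic).
        hospital = min(candidates, key=lambda h: (load[h] / capacity[h], h))
        # It claims its least populous bordering area (ties broken by name).
        area = min(frontier[hospital], key=lambda a: (population[a], a))
        labels[area] = hospital
        load[hospital] += population[area]

    return labels
```

Because a hospital can only claim areas adjacent to its existing catchment, each catchment in this sketch is contiguous by construction. On a hypothetical chain of five equally populated areas with hospitals at either end, the hospital with more capacity ends up with the larger share of the chain.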

Estimating surge hospital capacity in Britain during the COVID-19 pandemic
Identifying a set of capacity data for the NHS proved complex. After several attempts to integrate data from various sources, we ultimately performed a manual curation of the sources listed below, with gaps or inconsistencies filled in by consulting the relevant hospital's website. The resulting list is a snapshot in time of capacity and is not representative of up-to-date practice. During the course of the COVID-19 pandemic a small number of NHS trusts merged, which had to be adjusted for manually. There are also significant limitations due to the different ways in which the devolved administrations of the UK (England, Wales, Scotland and Northern Ireland) published situation reports of bed capacity during the pandemic. As a result, only hospitals in England and Wales have assessments of surge capacity, and we had no reliable information about Northern Ireland at all, so it was excluded. This does not significantly alter our conclusions here about the nature of the algorithm, but should be borne in mind if the data set is to be used for other purposes.

NHS and Trust GIS locations (England):
• https://www.nhs.uk/about-us/nhs-website-datasets/ • Lists of independent and NHS hospitals and trusts with location data • public

Sitrep (Situation reports) data:
England: • filename: Covid sitrep report incl CIC 20200408 FINAL.xlsx • Acute and ICU beds available in England at site level • ICU (SIT032) and HDU (SIT033) beds available; many data quality issues and missing trusts • restricted
Wales:

Characterisation of misclassification
In Supplementary Table 1 we qualitatively examine the ten NHS Trusts with the highest number of ITU patients whom the label propagation algorithm predicted would be admitted elsewhere, and who were therefore misclassified. These represent 1833 (38.7%) of the total misclassifications. The majority of these ten hospitals are major tertiary referral intensive care units, or specialist centres, as demonstrated by their being in the top quintile of NHS trusts by ITU bed capacity. This result is consistent with two possibilities: that severely ill patients may end up in specialist centres rather than their closest hospital for treatment, or that in the event of a large surge in cases, patients may overflow from smaller to larger intensive care units. Either could lead to misclassification of these patients by the label propagation algorithm, as we see here.
In Supplementary Table 2 we look at the trusts where the fewest cases are incorrectly assigned to other trusts by the label propagation algorithm. Although these are generally the smaller intensive care units, this is not universally the case. This is only a measure of type 1 error, and a low count could be the result of an inappropriately large catchment area.
The distribution of patients who attended hospitals with the fewest misclassification errors is shown in Supplementary Figure 1. These tend to be the intensive care units with the fewest attendees, with a few long-distance, out-of-area patients. The distribution of patients who attended hospitals with the most misclassification errors is shown in Supplementary Figure 2. These tend to be the intensive care units with many attendees, spread over much wider areas than the algorithm predicts. They are typically large intensive care units based in dense towns, where there are many other hospitals. A limitation of the label propagation algorithm is that, because these hospitals are tertiary referral centres, their catchment areas for ITU services are probably different in nature from those of the surrounding smaller hospitals. In this case a two-layered approach to the catchment area may be more appropriate, where one layer considers wider tertiary referral and the other considers directly admitted patients, who will tend to be more local.
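The per-trust tallies discussed in this section could be computed along the following lines. This is a minimal sketch under stated assumptions: the record layout (one pair of admitting trust and predicted trust per patient) and the function names `misclassified_by_trust` and `top_n_share` are hypothetical and are not taken from the paper's analysis code.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def misclassified_by_trust(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count, per admitting trust, patients the algorithm predicted elsewhere.

    Each record is an (admitted_trust, predicted_trust) pair for one patient.
    """
    counts: Counter = Counter()
    for admitted, predicted in records:
        if admitted != predicted:
            counts[admitted] += 1
    return counts


def top_n_share(counts: Counter, n: int) -> Tuple[List[Tuple[str, int]], float]:
    """Return the n trusts with the most misclassifications and the share
    of all misclassifications that they account for."""
    total = sum(counts.values())
    top = counts.most_common(n)
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share
```

Calling `top_n_share(counts, 10)` on real patient-level data would give the kind of summary reported above for the ten trusts with the most misclassifications, while sorting the same counts in ascending order would identify the trusts with the fewest.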